Putting hornets on the genomic map

Hornets are the largest of the social wasps, and are important regulators of insect populations in their native ranges. Hornets are also very successful as invasive species, with often devastating economic, ecological and societal effects. Understanding why these wasps are such successful invaders is critical to managing future introductions and minimising impact on native biodiversity. Critical to the management toolkit is a comprehensive genomic resource for these insects. Here we provide the annotated genomes for two hornets, Vespa crabro and Vespa velutina. We compare their genomes with those of other social Hymenoptera, including the northern giant hornet Vespa mandarinia. The three hornet genomes show evidence of selection pressure on genes associated with reproduction, which might facilitate the transition into invasive ranges. Vespa crabro has experienced positive selection on the highest number of genes, including those putatively associated with molecular binding and olfactory systems. Caste-specific brain transcriptomic analysis also revealed 133 differentially expressed genes, some of which are associated with olfactory functions. This report provides a spring-board for advancing our understanding of the evolution and ecology of hornets, and opens up opportunities for using molecular methods in the future management of both native and invasive populations of these over-looked insects.

. Comparison of hornet (Vespa) genome assemblies statistics with honeybee, fire ant and other social wasp genomes. The three Vespa assemblies (including our two assemblies with the star) with recent wasp genomes and one representative from each ant and bee group. All data in Supplementary Table ST4 www.nature.com/scientificreports/ invasion success, it is surprising that Vespa have not received more attention as subjects of genomic studies. It is therefore time to put Vespa genomes on the map and explore their potential.
Here we compare the genomes of three ecologically, economically and societally important hornets-the European hornet Vespa crabro, the yellow-legged Asian hornet Vespa velutina, and the northern giant hornet Vespa mandarinia 46 . Two of these genomes (V. crabro and V. velutina) were sequenced and annotated for the purposes of this study. We compare the genome compositions of the three Vespa species with each other and contrast them with other social insect genomes (Aim 1). We are specifically interested in comparing lineages of insects that have evolved superorganismality independently 47 , to explore losses and gains of genes related to social organisation, such as chemoreceptors 48,49 . We identified genes undergoing rapid evolution and family expansions (Aim 2) and identified specific differences among the three Vespa species, such as duplication events among genes involved in communication. These provide insights into the ecological differences among the three study species, for instance by describing species-specific olfactory systems. We identified signs of positive selection in Vespa relative to other social insects and determined whether they were enriched for any specific functionalities. We found little evidence that genes under selection may be involved in caste determination. Finally, we examined differential gene expression among castes in one species, V. crabro (Aim 3), presenting the first transcriptomic analyses of castes in a superorganismal wasp.

Materials and methods
Sample collection. For Aim 1, four colonies of V. crabro were collected in September 2017 from four different locations in their native range of southern England (see Supplementary Table ST2). Individual workers, gynes (non-mated queens), males and queens were flash frozen on dry ice or collected straight into RNAlater and stored at -80 °C thereafter. Samples from two colonies of V. velutina were collected in 2017 from Ventimiglia, Italy which is part of the invasive range 50 . Workers and gynes were collected alive and stored immediately in ethanol and − 20 °C thereafter. DNA and RNA library preparation and sequencing. DNA was extracted from whole bodies of two males of V. crabro using Qiagen DNeasy blood and tissue kit (Catalogue number: 69504; Hilden, Germany) following the manufacturers' protocol (details of this and all following protocols in Supplementary Information). One sample contributed to the two runs from a pair-end TruSeq Nano LT library (Cat #: FC-121-4001, Illumina, San Diego, CA, USA; INSDC accession numbers: SRR11213735 and SRR11213736) and the other to the Nextera  RNA was extracted from a range of tissues (from workers, gynes, queens and larvae) for gene annotation purposes. For Aim 3, we also sequenced brain RNA from two different castes-gynes and workers-of V. crabro. Using gynes rather than mature queens avoids the effects of matedness, age, overwintering, and reproductive maturity (developed eggs) on gene expression. We focused on measuring variation in gene expression related to behaviour between castes 47 ; thus, brains were dissected from individuals derived from three to four different nests (see Supplementary Table ST2), and total RNA was extracted from individuals following published methods 51 (RNeasy Mini Kit Cat # 74104, Qiagen). RNA extractions were checked for quality using TapeStation. Three to four brains were then pooled for each caste, ensuring that no single pool had more than one wasp from the same nest; this generated a total of four worker pooled samples and five gyne pooled samples for sequencing, representing respectively four and five biological replicates. We sequenced these nine pooled samples (150 bp paired-end, Illumina NovaSeq platform, by Novogene) and obtained at least 22 million reads per sample (Supplementary Table ST2).
Genome assembly and annotation. We assembled and annotated two new Vespa genomes. The genome of V. crabro was assembled with SOAPdenovo_v2.04 52,53 into contigs from one haploid male. The resulting assembly was screened for contaminants during INSDC submission (INSDC accession: JAITYU000000000). The genome of V. velutina (INSDC accession PRJEB46979) was assembled with the following process, palindromic read correction with pbclip, initial PacBio assembly generation with Falcon-unzip 54 , retained haplotig separation with purge_dups, Hi-C based scaffolding with SALSA2, Arrow polishing (from Pacific Biosciences, https:// github. com/ Pacifi cBio scien ces/ Genom icCon sensus), and short-read polishing using FreeBayes-called variants from 10× Genomics Chromium reads aligned with LongRanger. Chromosome-scale scaffolds confirmed by the Hi-C data have been named in order of size.
In order to compare genome compositions amongst Vespa and with other social insects (Aim 2), we annotated these two new genomes for structural and functional loci. We used RNAseq data combined with "ab initio" and comparative computational methods, including mapping wasp UniProt-derived proteins, to predict genes in the repeatmasked genomes of V. crabro and V. velutina (Supplementary Information; Supplementary Table ST3). We obtained genome and annotation files from all other species from INSDC (Supplementary Table ST4). We estimated the assemblies' statistics using BUSCO v5.1.2 55 and QUAST v5.0.2 56 . Additionally, we compared synteny between genomes (Supplementary Table ST5A) with D-genies using Minimap v.2 57,58 . Finally, to annotate Transposable Elements (TE), we used RepeatModeler v2.0.4 59 to generate a model of TEs for the V. velutina and V. crabro genomes separately. Consensus sequences were clustered and filtered using cd-hit v4.6.8 60 and BBmap v39.01 61 as recommended 62 . We then used RepeatMasker v4.1.0 63 to annotate TEs based on the model for that genome (Supplementary Table ST4. Orthogroup exploration for positive selection and duplication events. In order to explore patterns of gene evolution across the species (Aim 2), we ran OrthoFinder 2.5.4 64 on the longest isoform of each protein, first on eight Hymenoptera species (Vespa crabro, Vespa velutina, Vespa mandarinia, Vespula germanica, Vespula pensylvanica, Vespula vulgaris, Apis mellifera, Solenopsis invicta; Fig. 2a), then on seven Hymenoptera species (without the ant, S. invicta). We based our duplication events exploration on the resulting listed multiplecopy orthologues for each species and each internal node.
We kept 2685 resulting nearly-1:1 orthogroups (i.e. between one and three copies of the gene in any given species, and any given gene had to be present in a minimum of six species) for the six wasps and the bee. This is a more relaxed filtering method than strict single-copy orthologs, allowing more data points in our exploration 49,65 . We then aligned all species' sequences using PRANK v.151120 (gffread -x, PRANK options -f = paml -F -codon 66 ). We investigated evidence of positive selection being experienced in the past on a specific branch (branch-test; M0 (Model = 0, Nsites = 0); M2 (Model = 2, Nsites = 0)) with PAML v.l4.9j 67 using codeml with the phylogenetic tree from OrthoFinder of six vespid species (three Vespa and three Vespula species). To gain more insight on specific loci, we then ran branch-site models, focusing on potential loci on specific branches ("foreground") against other branches ("background") using codeml's dN/dS branch-site models of positive selection 68 . Briefly, we compared the ratio of non-synonymous mutations to synonymous mutations (ω) in nearly 1:1 single-copy orthogroups of six those vespid species and tested two models: neutral evolution in foreground branch (Model = 2, Nsites = 2; ω fixed at 1, null hypothesis) and positive selection in foreground branch (Model = 2, Nsites = 2; ω > 1, alternative hypothesis; Supplementary Table ST6). We ran likelihood ratio tests 67 and further tested for association with significance of log-likelihood test with chi square tests. Significant loci having experienced positive selection are those orthogroups with adjusted P value (Benjamini-Hochberg) below 0.05. We then assigned a description to each orthogroup using BLASTp v.2.10.0 69 72 and to obtain read count with featureCounts v1.6.4 73 . We assessed the quality of our data with a Principal Components Analysis of normalised read count, which showed samples to cluster by caste (Supplementary Figure SF7). We then conducted a differential gene expression analysis between worker and gyne read count using DESeq2 v.1.26.0 74 (log normalization; alpha cutoff = 0.05; Bonferoni adjustment). Similar to the positive selection analysis, we used BLASTp and TopGO to explore results.

Results and Discussion
Aim 1: Vespa genomes statistics. We sequenced and assembled the draft genomes of V. crabro and V.
velutina, using short-read and long-read data, respectively. We compare these assemblies to a publicly available assembly for V. mandarinia (Fig. 1), together with genomes from representatives of other aculeate wasps, ants and bees (Fig. 2a, Table 1 Figure SF10). This range is larger than the 10% calculated in the Polistine lineage 75 . Both species'   Figure SF8-SF9) fits within the expected range for insects. Interestingly, while all families appear to be larger in V. velutina, a larger proportion of Penelope elements are found in V. crabro (5.9%) compared to V. velutina (0.2%, which is within the range of recently published annotations of Polistes exclamans (0.16%) and Mischocyttarus mexicanus (0.02%) 75 ). An extensive survey of Hymenoptera TEs will provide more insights in these variations. In the pairwise assembly mapping between the three Vespa genomes, we find the highest contig similarity between V. mandarinia and V. velutina long-read genomes (88% of contigs are between 50 and 75% similar, Supplementary Table ST5A), while short-read V. crabro genome typically scores lower synteny score due to its fragmentation 34,57 . Interestingly, there is a large part of the V. crabro assembly that is not similar to the other two species' assembly: over 5% of the V. crabro genome does not map to the V. mandarinia genome nor the V. velutina genome (i.e. no match in contig similarity; Supplementary Figures SF4-SF6), but these regions are small (less than 2000 bp) and not in coding regions (i.e. not in the annotation file). Their sequences are not similar to those of the closest model organism Apis mellifera (Supplementary Table ST5B); they are most probably representing some repetitive elements. We additionally find some evidence of a one-off diagonal inversion in V. mandarinia (length: 1,152,777 bp; on original strand: locus NW_023395844.1, start 3,701,405, end 5,336,819) when mapped to V. velutina (on original strand: locus SUPER_22, start 2,372,742, end 4,003,498; Supplementary Figure SF6). This region has 96 annotated V. mandarinia genes and seems to be only inverted between V. mandarinia and V. velutina, hinting towards a V. mandarinia-specific rearrangement. Further improvements in assembly contiguity for V. crabro as well as further V. mandarinia population-level resequencing are needed to quantify and qualify both the unique region covering 5% of V. crabro genome and this potential inversion in V. mandarinia.

Aim 2: evidence of gene duplication and site-specific positive selection in Vespa. Gene family
expansions are thought to be associated with novel functions 78 and invertebrate invasion 79 . Target gene families include those involved in communication, odorant binding 49 and caste differentiation 80 . We thus compared the number of multiple copies of orthogroups between the three Vespa genomes. We focus on 16,913 orthogroups present in at least one species among ten chosen Hymenoptera representatives. Between lineages, we find more gene duplication events along the Vespa branch (25%; 5632 out of 22,976 events associated with 16,913 orthogroups) than along the Vespula branch (4%; 990 events; Supplementary Figure SF3). V. crabro has 1,187 duplicated orthogroups (7% of 16,913 orthogroups), including the highest proportion of species-specific gene family expansions: 77% of duplicated genes have two or more copies unique to this species (911 out of 1187). V. mandarinia has the highest number of duplicated orthogroups (2,177; 13% of 16,913 orthogroups), including 65% species-specific gene family expansions. V. velutina has 881 duplicated orthogroups (5% of 16,913 orthogroups), including 75% species-specific gene family expansions (Fig. 3a, Supplementary Table ST7). There are 28 genes that are duplicated in all three Vespa, some of which could be associated with the odorant or nervous systems (see notable examples of best BLASTp hits in Fig. 3b). V. crabro-specific gene duplications include sequences similar to notable Hymenoptera proteins. Those of particular interest include ten proteins from zinc finger family, a gene family known to also be duplicated in incipiently social Ceratina bees 80 ; odorant receptors which are commonly found in lineage-specific expansions in Hymenoptera evolution 49 and predicted to be duplicated in 80 invasive insect species 81 , and transposable element derived proteins which are involved in DNA-binding transcription factor activities and are thought to be associated with regulation of phenotypes via genome architecture variation 82 (Supplementary Table ST8).
Selection pressures associated with social phenotypic plasticity impact the molecular evolution of proteins 83 . We thus explored the rate of protein evolution (dN/dS) for each single-copy orthogroup in the six Vespa and Vespula species (branch models: Supplementary Tables ST8-ST16; branch-site models: Supplementary Tables  ST17-ST24). In branch-site models, V. crabro has 104 genes that showed evidence of having experienced positive selection (FDR 0.05, Fig. 3c, Supplementary Table ST17). From the best BLAST hits, notable loci include four zinc finger proteins (e.g. XP_006560154.1 zinc finger CCHC domain-containing protein 4)-a family known for its binding sites being more variable in social bees compared to solitary bees 84 and known for being included in an ant social supergene 85 ; an intraflagellar transport protein (XP_006560607.2)-known to be associated with insect olfactory sensilia used for communication 86 . In V. velutina, 57 genes showed evidence of positive selection (Supplementary Table ST21). Notable loci included: ELMO domain-containing protein C (XP_395400.7), found to be upregulated in a incipiently social bee when experimentally forced to increased brood provisioning 87 ; and senecionine N-oxygenase (XP_006571261.2), which is involved in detoxification and found upregulated in pesticide-sprayed beetle 88 . In Vespa mandarinia, 31 genes showed evidence of positive selection (Supplementary Table ST19) including deformed epidermal autoregulatory factor 1 (XP_395757.3), a transcription factor associated with honey bee egg laying behaviour 89 . We found only one locus (OG0002464) that had experienced positive selection across all three Vespa species: the sequence matches in BLAST Apis cerana's transmembrane protein-like (Fig. 3c). One locus-Spidroin-2-like precursor protein (NP_001314894.1)-was under selection in both V. crabro and V. mandarinia; this gene could be playing a role during pupa development 90 . Zinc finger CCHC domain-containing protein 4 (XP_006560154.1) was also under selection in both V. crabro and V. velutina.
When we compared these results with Vespula species, we found similar range of genes with evidence of positive selection (Supplementary Tables ST18, ST20, ST22). However, there was no overlap of genes with dN/ dS > 1 between the six vespine wasp species nor distinct vespine chromosomal hotspot for positive selection (Fig. 3c in blue).
We next tested for GO term enrichment among the genes under positive selection in each Vespa branch-site model ( Supplementary Tables ST25-ST30, Fig. 3d). There was no significant enrichment at the most stringent level (FDR 0.05). However, examination of terms with the lowest raw P values (P raw < 0.05) revealed functions associated with reproduction and morphological innovations: oogenesis (GO:0048477, hallmark of eusocial In summary, the Vespa lineage may have experienced positive selection on genes related to communication, reproduction, production/regulation of alternative phenotypes. Our comparative analyses on orthologous genes consistently highlighted V. crabro, with the highest number of genes under positive selection and associated GO Terms, and with the highest proportion of species-specific duplicated genes in the Vespa lineage. Thus, we explored gene expression between phenotypes in this species, for our last Aim. Aim 3: caste-associated differential gene expression in Vespa crabro. Analyses of the molecular basis of caste differentiation in the brains of social insects has provided important insights into how division of labour in group living societies evolves 94,95 . However, to date these analyses in wasps have been limited to species from simple societies. Here we provide the first analysis of brain transcriptomes among castes of a superorganismal wasp. We extracted RNA from 40 individual brains of V. crabro. These were sequenced as five pooled samples of gynes (i.e. each sample contains brains from four individuals) and four pooled samples of workers (i.e. each sample contains brains from three individuals). We detected 1171 differentially expressed genes between workers and gynes (FDR-adjusted P value < 0.05; no log 2 fold change filter; 648 upregulated, 523 downregulated, Fig. 4a). After filtering for log 2 fold change 1.5, 133 genes remained: 63 were upregulated and 70 were downregulated genes in gynes relative to workers (Fig. 4b, Supplementary Table ST31). These included sequences with BLAST match to Apis mellifera's zinc finger CCCH domain-containing protein 13-also found in V. mandarinia's queen transcriptome 96 -, and odorant receptor proteins, recurrent in all hymenopteran olfactory repertoires, including hornets 97 . Caste-enriched GO terms for the differentially expressed genes (FDR < 0.05, Supplementary Table ST32) included pheromone-related activity (GO:0016229, also upregulated in experimentally-isolated bumble- We examined whether any of the differentially expressed genes in V. crabro had experienced positive selection or duplication events. A single locus was found in both the dataset of duplicated genes (duplicated in OrthoFinder analysis; aim 2) and differentially expressed between castes (Vcabro1a000853P1), which shares sequence similarity with heterogeneous nuclear ribonucleoprotein H in Apis mellifera (Supplementary Table ST31). This locus has previously been identified as caste-biased in Cape honey bees, where it is implicated in alternative splicing leading to worker ovary activation 100 . It is also known in several bee species to recognise epigenetic modifications and is part of the post-transcriptional RNA modification machinery 101 .

Conclusion
In this study, we sequenced, assembled and annotated two new Vespa genomes; we also generated gene annotation files, available for the community. We then conducted an analysis of gene evolution for vespine wasps, comparing with other social Hymenoptera. We detected high levels of gene duplications among genes associated with reproduction, communication and production of phenotypic variation in the Vespa species; for instance, two genes related to odorant receptors are duplicated in V. velutina. Furthermore, we found 104 genes under positive selection in V. crabro, including some associated with reproduction such as oogenesis. Finally, we provide the first analyses of caste-specific brain gene expression for a superorganismal wasp, comparing transcriptomes derived from adult workers and unmated female reproductives (gynes).
Our analyses of duplications, positive selection and differentially expressed genes detected associations with the olfactory system. Communication is behaviourally key to all social insects for reproduction 102 , colony health 103 and species' survival, including invasive species 104 . This is supported by evolutionary variation in chemosensory repertoire 105 in social and invasive insects, for instance, loss of gustatory receptors in caterpillar pests, and duplicated chemosensory genes in invasive insects. Genes belonging to the zinc finger family were also found repeatedly in our analyses. This is one of the largest families of transcription factors, and so may not be so surprising. However, we suggest that this is a biologically interesting result: zinc finger domains found to be under selection in the Vespa lineage (duplicated, dN/dS, differentially expressed) may have experienced lineage-specific expansions and may be related to new gene regulatory functions 106,107 . Future work should focus on conducting functional tests, e.g. CRISPR to demonstrate functionality.
Accordingly, the genomic resources and analyses we provide secures the hornets a place in the ever-growing world of genome sequencing and analyses, and adds significantly to what remains an underrepresented group of insects in the genomics world-the aculeate wasps 7 . As an example of the exponential growth of genomics datasets leading to population genetics and pangenome analyses, since our analyses, the Darwin Tree of Life initiative made available a new Vespa crabro assembly. These new datasets will improve the scope of comparative analyses in the Hymenoptera by, for instance, providing a more complete survey of non-coding regions and olfactory receptors related to the evolution of social organisation in insects. Finally, our datasets hint at candidate genes with important gene functions-namely those involved in regulation of reproduction and communication, and open up new opportunities to explore the molecular mechanisms underpinning key ecological and physiological traits, such as those associated with invasive species and venom components in invasive Vespula species 76 . www.nature.com/scientificreports/ Future work should focus on population genetic studies to further explore selection pressures associated with social evolution and anthropogenic impacts (e.g. high F ST regions indicating directional selection and Tajima's D indicating balancing selection within species, especially in the context of comparison between invasive and native ranges). Furthermore, it would be interesting to conduct a comprehensive TE analysis across wasps to assess potential functions related to phenotypic plasticity, for instance when comparing native and invasive species [108][109][110] . Finally, we encourage single-cell brain expression analyses 111 , and analysis of multiple tissues from across the diversity of phenotypes 112 to further explore differential gene expression in Vespa crabro. Such approaches will help build a better understanding of these ecological and economically important insects.